The roles of expectation, comparator, administration route, and population in open-label placebo effects: a network meta-analysis

Three meta-analyses have demonstrated the clinical potential of open-label placebos (OLPs). However, there is a need to synthesize the existing evidence through more complex analyses that would make it possible to answer questions beyond mere efficacy. Such analyses would serve to improve the understanding of why and under what circumstances OLPs work (e.g., depending on induced expectations or across different control groups). To answer these questions, we conducted the first network meta-analyses in the field of OLPs. Our analyses revealed that OLPs could be beneficial in comparison to no treatment in nonclinical (12 trials; 1015 participants) and clinical populations (25 trials; 2006 participants). Positive treatment expectations were found to be important for OLPs to work. Also, OLP effects can vary depending on the comparator used. While the kind of administration route had no substantial impact on the OLP effects, effects were found to be larger in clinical populations than in nonclinical populations. These results suggest that the expectation, comparator, administration route, and population should be considered when designing and interpreting OLP studies.


GRADE Ratings for each network
We used the Grading of Recommendations Assessment, Development, and Evaluation ratings (GRADE 1) and the corresponding web application to apply this framework 2,3. The certainty of evidence for each network estimate was assessed according to the following criteria:

Study limitations (Within study bias):
The overall risk of bias of each study was categorized. According to the Cochrane Risk of Bias tool 2 4 , we rated five risk of bias domains. We then used the contribution matrix to calculate the percentage of contribution from each study, and finally assessed the study limitation for each network estimate based on the weighted average risk of bias of the contributing studies. We selected the rule "Average Risk of Bias" in order to calculate the within study bias.
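The "Average Risk of Bias" rule described above can be illustrated with a minimal sketch. The numeric coding of the judgements (1 to 3), the rounding rule, and all names are assumptions made for illustration; they are not taken from the paper or from the web application.

```python
# Hypothetical numeric coding of overall risk-of-bias judgements (assumption).
ROB_SCORE = {"low": 1, "some concerns": 2, "high": 3}
LABELS = {1: "no concerns", 2: "some concerns", 3: "major concerns"}


def average_rob(contributions, judgements):
    """Contribution-weighted average risk of bias for one network estimate.

    contributions: study id -> percentage contribution to the estimate
    judgements:    study id -> overall risk-of-bias judgement of that study
    Returns the weighted score and the resulting study-limitations label.
    """
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("contributions must sum to a positive value")
    weighted = sum(
        weight * ROB_SCORE[judgements[study]]
        for study, weight in contributions.items()
    ) / total
    # Round to the nearest level, halves upwards (assumed tie-breaking rule).
    level = min(3, max(1, int(weighted + 0.5)))
    return weighted, LABELS[level]


# Three hypothetical studies contributing 50%, 30%, and 20% to one estimate.
score, label = average_rob(
    {"A": 50, "B": 30, "C": 20},
    {"A": "low", "B": "some concerns", "C": "high"},
)
```

In this made-up example the weighted score is 1.7, which rounds to "some concerns" for the network estimate.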

Reporting bias (Across studies bias):
Since each of our comparisons included fewer than 10 studies, we could not use the ROB-MEN 5 tool to assess reporting bias. Instead, we produced a comparison-adjusted funnel plot with an accompanying Egger test for asymmetry and based our judgment on it.
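As an illustration of the asymmetry test, the following is a minimal sketch of an Egger-type regression, in which the standardized effect is regressed on precision and an intercept far from zero points to funnel plot asymmetry. It assumes that the effects have already been centred on their comparison-specific summary effect (the "comparison-adjusted" step), and it returns the t statistic rather than a p-value; the function name and inputs are hypothetical and do not reproduce the authors' software.

```python
import math


def egger_regression(effects, std_errors):
    """Egger-type regression of standardized effect on precision.

    effects:    comparison-adjusted effect sizes (assumed centred beforehand)
    std_errors: their standard errors
    Returns (intercept, slope, t statistic of the intercept).
    """
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least three studies with matching inputs")
    x = [1.0 / se for se in std_errors]                  # precision
    z = [y / se for y, se in zip(effects, std_errors)]   # standardized effect
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_intercept = math.sqrt(sse / (n - 2) * (1.0 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.nan
    return intercept, slope, t_stat
```

The t statistic would be compared against a t distribution with n - 2 degrees of freedom to obtain a p-value.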

Indirectness:
We judged that there was no concern in this domain as the included studies matched our inclusion criteria and study questions.

Imprecision:
In line with previous analyses 6 , we considered a clinically meaningful threshold for standardized mean difference (SMD) to be 0.20.

Heterogeneity:
We evaluated the degree of concern by comparing the clinical inference based on the 95% confidence intervals (CI) with that based on the 95% prediction intervals (PI), the latter reflecting the degree of heterogeneity. Applying the same clinical inference framework as for imprecision, we saw no concerns in heterogeneity when the two judgements matched (e.g. no concern based on 95% CI and no concern based on 95% PI), some concerns when they differed by one degree (e.g. no concern based on 95% CI but some concerns based on 95% PI), and major concerns when they differed by two degrees (e.g. no concern based on 95% CI but major concerns based on 95% PI).
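The comparison of CI-based and PI-based judgements can be sketched as follows. The interval grading rule used here (counting how many of the bounds -0.20 and +0.20 fall inside the interval) is a simplifying assumption standing in for the clinical inference framework, and the function names are hypothetical; only the "differ by zero, one, or two degrees" step follows the description above.

```python
LEVELS = ("no concerns", "some concerns", "major concerns")


def grade_interval(lower, upper, threshold=0.20):
    """Assumed grading rule: count the bounds -threshold and +threshold that
    lie strictly inside the interval (0, 1, or 2 crossed bounds)."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return sum(1 for bound in (-threshold, threshold) if lower < bound < upper)


def heterogeneity_concern(ci, pi, threshold=0.20):
    """Compare the judgement based on the 95% CI with the one based on the
    95% PI; the number of degrees by which they differ gives the rating."""
    difference = abs(grade_interval(*pi, threshold) - grade_interval(*ci, threshold))
    return LEVELS[difference]


# Made-up SMD intervals, not taken from the paper.
same = heterogeneity_concern(ci=(0.30, 0.80), pi=(0.25, 0.90))
one_degree = heterogeneity_concern(ci=(0.30, 0.80), pi=(0.10, 1.00))
two_degrees = heterogeneity_concern(ci=(0.30, 0.80), pi=(-0.40, 1.50))
```

With these made-up intervals, the three calls yield "no concerns", "some concerns", and "major concerns", respectively.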

Clinical network
We found some concerns for within-study bias (i.e., study limitations) for most pairwise comparisons, because the studies were unblinded and most outcomes were self-reported. In terms of the across-study bias (i.e., reporting bias), the Egger test for funnel plot asymmetry was significant (p = .036), indicating that reporting bias is a threat to the network meta-analysis. There was no concern for indirectness, since the included studies all matched our study questions. Evaluating imprecision, we found that all statistically significant comparisons revealed a clinically significant effect size, except for two comparisons (cOLP suspension vs. DP, cOLP suspension vs. OLP-) where we found major concerns regarding the clinical significance of observed effects. Furthermore, we examined heterogeneity, which is represented by the 95% prediction interval for each individual comparison. For three statistically significant comparisons (TAU vs. cOLP pills, NT vs. cOLP pills, OLP- vs. OLP pills) there were some concerns regarding heterogeneity, indicating that there is some variability of effects. All other significant comparisons revealed no concerns. Furthermore, we found no evidence for substantial and statistically significant heterogeneity in the network as a whole (within design Q = 12.62, p = .557, τ² = 0.024, I² = 26.5%). Finally, there was evidence of incoherence between the direct and indirect evidence in three comparisons, i.e., cOLP suspension vs. OLP-, cOLP suspension vs. DP, DP vs. OLP-. For those comparisons where only indirect evidence was available, incoherence was set to major concerns. We also identified evidence of inconsistency in the NMA when calculating the global design-by-treatment interaction test (between designs Q = 11.86, p = .018).
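For readers less familiar with the heterogeneity statistics, the standard relation between a Q statistic and I² can be sketched as below. This is the general Higgins and Thompson definition; which Q and which degrees of freedom underlie the I² values reported here is not stated in this section, so the example uses made-up numbers.

```python
def i_squared(q, df):
    """I² in percent: the share of variability in Q beyond what chance alone
    (df degrees of freedom) would produce, truncated at zero."""
    if df < 0:
        raise ValueError("degrees of freedom must be non-negative")
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Made-up example: Q = 20 on 10 degrees of freedom.
example = i_squared(20.0, 10)
```

In the made-up example, half of the observed variability exceeds what chance would explain, so I² is 50%; whenever Q is at or below its degrees of freedom, I² is 0%.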
Eligibility criteria (item 6): Specify study characteristics (e.g., PICOS, length of follow-up) and report characteristics (e.g., years considered, language, publication status) used as criteria for eligibility, giving rationale. Clearly describe eligible treatments included in the treatment network, and note whether any have been clustered or merged into the same node (with justification). Reported on p.5-6.
Information sources (item 7): Describe all information sources (e.g., databases with dates of coverage, contact with study authors to identify additional studies) in the search and date last searched. Reported on p.5.
Search (item 8): Present full electronic search strategy for at least one database, including any limits used, such that it could be repeated. Reported on p.5, eAppendix 1.
Study selection (item 9): State the process for selecting studies (i.e., screening, eligibility, included in systematic review, and, if applicable, included in the meta-analysis). Reported on p.5-6.
Data collection process (item 10): Describe method of data extraction from reports (e.g., piloted forms, independently, in duplicate) and any processes for obtaining and confirming data from investigators. Reported on p.6-7.
Data items (item 11): List and define all variables for which data were sought (e.g., PICOS, funding sources) and any assumptions and simplifications made. Reported on p.5-7.
Geometry of the network (item S1): Describe methods used to explore the geometry of the treatment network under study and potential biases related to it. This should include how the evidence base has been graphically summarized for presentation, and what characteristics were compiled and used to describe the evidence base to readers. Reported on p.7-9.
Risk of bias within individual studies (item 12): Describe methods used for assessing risk of bias of individual studies (including specification of whether this was done at the study or outcome level), and how this information is to be used in any data synthesis.

Summary of evidence (item 24): Summarize the main findings, including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., healthcare providers, users, and policy-makers). Reported on p.12-14.
Limitations (item 25): Discuss limitations at study and outcome level (e.g., risk of bias), and at review level (e.g., incomplete retrieval of identified research, reporting bias). Comment on the validity of the assumptions, such as transitivity and consistency. Comment on any concerns regarding network geometry (e.g., avoidance of certain comparisons). Reported on p.14-15.
Conclusions (item 26): Provide a general interpretation of the results in the context of other evidence, and implications for future research.
Funding (item 27): Describe sources of funding for the systematic review and other support (e.g., supply of data); role of funders for the systematic review. This should also include information regarding whether funding has been received from manufacturers of treatments in the network and/or whether some of the authors are content experts with professional conflicts of interest that could affect use of treatments in the network. Reported on p.17.

PICOS = population, intervention, comparators, outcomes, study design. * Text in italics indicates wording specific to reporting of network meta-analyses that has been added to guidance from the PRISMA statement. † Authors may wish to plan for use of appendices to present all relevant information in full detail for items in this section.

Adverse events
Regarding adverse events, it is remarkable that few studies reported them systematically, and many did not report them at all. In total, 15 of the 37 studies made a statement regarding adverse events. From these reports, it is apparent that relatively few adverse events occur in the context of OLP treatment. This suggests that OLP is a safe treatment that is mostly free of side effects. However, because adverse events were reported inconsistently or not at all, it is difficult to draw a firm conclusion.

Certainty of the evidence
The certainty of evidence for the network estimates of both samples was examined by using GRADE. The results for study limitations (within study bias), reporting bias (across-studies bias), indirectness, imprecision, heterogeneity, and incoherence can be found in the supplement (eAppendix 3-4, eFigure 2).

Sensitivity analysis
To investigate the impact of studies at high risk of bias, we repeated the analyses including only studies in which the risk of bias was low or moderate. In each sample, one study was rated as high risk of bias and was therefore excluded, and the results were compared with those of the whole sample. The results in the nonclinical network remained unchanged in principle; only OLP nasal changed from being marginally significant to nonsignificant. In the clinical sample, cOLP pills moved from being significant to nonsignificant, as only one study with a cOLP pills group remained in the network. Otherwise, results and heterogeneity measures remained comparable.
To investigate the impact of including studies with subclinical populations within the clinical sample, we conducted a sensitivity analysis excluding studies with subclinical samples. In principle, the results remained unchanged, with a trend towards slightly larger effect sizes when subclinical studies were excluded (see eFigure 3-6 in the supplement for the results of sensitivity analyses). Surprisingly, heterogeneity increased from I² = 26.5% (clinical all) to I² = 32.6% (clinical without subclinical).
Furthermore, owing to the great variance of included conditions within each of the two networks, we performed subgroup analyses for two broad areas: pain (i.e., chronic back pain, experimental pain, irritable bowel syndrome, knee osteoarthritis) and psychological (i.e., depression, fatigue, conditions, well-being, insomnia, test anxiety, sadness, relaxation, stress). The results for the clinical pain network (11 studies) were comparable to those of the whole network, except that the treatment programme changed to being significantly better than NT, whereas OLP- moved to being significantly worse than NT. Interestingly, heterogeneity was reduced from I² = 26.5% (clinical all) to I² = 0% (clinical pain). Within the nonclinical pain sample (N = 4), results also changed only marginally, with OLP nasal no longer being significantly better than NT. Heterogeneity also decreased from I² = 66% (nonclinical all) to I² = 51.7% (nonclinical pain). Within the psychological subsamples results could in general also be replicated

Note. Column headers are identical to row headers. Cells contain the network estimates (SMDs) from network meta-analysis (direct and indirect evidence) in the lower triangle and the direct treatment estimates (SMDs) from pairwise comparisons in the upper triangle. Comparisons considered for RQ1 (expectation) are marked with a *, for RQ2 (comparator) with a °, and for RQ3 (modalities) with a +. Legend: cOLP, conditioned Open-Label Placebo; DP, Deceptive Placebo; NT, No Treatment; OLP, Open-Label Placebo with rationale; OLP-, Open-Label Placebo without expectation induction; TAU, Treatment as Usual; WL, Wait List.